
feat: experimental backends - Crawl4AI, Obscura, and Camoufox loaders #1093

Closed
Ege-BULUT wants to merge 1 commit into ScrapeGraphAI:pre/beta from Ege-BULUT:pr/experimental-backends

@Ege-BULUT

Contributor

Summary

Adds an experimental backends module to ScrapeGraphAI with three alternative document loaders: Crawl4AI (async web crawler with markdown output), Obscura (CDP-based stealth browser), and Camoufox (Firefox fork with C++-level fingerprint spoofing). These backends can be selected via the new experimental key in node_config.

Also improves the core ChromiumLoader with a persistent Chrome profile, storage state caching across sessions, and Cloudflare challenge detection with user guidance.

What's included

New experimental backends

  • Crawl4aiLoader - uses Crawl4AI's AsyncWebCrawler for clean markdown/HTML output. Falls back to stealth Playwright + Malenia if Cloudflare blocks the crawl.
  • ObscuraLoader - connects to Obscura browser or a standard Chrome instance via CDP (remote debugging protocol). Supports Docker, subprocess, and Chrome auto-start modes.
  • CamoufoxLoader - starts the Camoufox server via npx, then uses its REST API for tab creation and JS evaluation.
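
The fallback behaviour described for Crawl4aiLoader can be sketched generically. This is a minimal illustration and not the PR's code: `fetch_with_fallback`, `primary`, `fallback`, and `looks_blocked` are hypothetical names, and both fetchers are injected as plain callables so that no Crawl4AI or Playwright API is assumed.

```python
from typing import Callable, Tuple

# A fetcher takes a URL and returns the page content as text.
Fetcher = Callable[[str], str]


def fetch_with_fallback(
    url: str,
    primary: Fetcher,
    fallback: Fetcher,
    looks_blocked: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (content, source), where source is "primary" or "fallback"."""
    try:
        content = primary(url)
    except Exception:
        # The primary backend failed outright, so use the fallback.
        return fallback(url), "fallback"
    if looks_blocked(content):
        # The primary backend returned a challenge page, not the document.
        return fallback(url), "fallback"
    return content, "primary"
```

In the PR's terms, `primary` would presumably wrap the Crawl4AI crawl and `fallback` the stealth Playwright path; returning the source alongside the content makes it easy to log which path served a request.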

Core improvements

  • Persistent Chrome profile - ChromiumLoader now uses a persistent user data directory and caches storage state (cookies, localStorage) across sessions. A session that has already passed an anti-bot challenge can therefore be reused after a restart instead of being challenged again.
  • Cloudflare detection - when Cloudflare is detected, a clear message guides users to solve the challenge once in non-headless mode (cookies are reused automatically afterward).
  • Backend switching - FetchNode accepts a new experimental config key to route to the selected backend loader.
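
As a rough illustration of the Cloudflare detection item, the check could be a simple marker scan over the fetched HTML. The marker strings, function names, and message wording below are assumptions chosen for the example; the PR's actual heuristics and message are not shown in this description.

```python
# Illustrative substrings that often appear on Cloudflare challenge pages.
# This list is an assumption for the sketch, not the PR's actual rule set.
CLOUDFLARE_MARKERS = (
    "just a moment...",
    "/cdn-cgi/challenge-platform/",
    "cf-chl",
)


def looks_like_cloudflare_challenge(html: str) -> bool:
    """Return True if the HTML contains any known challenge marker."""
    lowered = html.lower()
    return any(marker in lowered for marker in CLOUDFLARE_MARKERS)


def cloudflare_guidance(url: str) -> str:
    """Build a user-facing hint for solving the challenge once by hand."""
    return (
        f"Cloudflare challenge detected for {url}. "
        "Run once with headless disabled and solve the challenge in the "
        "browser window; the cached cookies are reused on later runs."
    )
```

A substring scan like this is cheap but can produce false positives on pages that merely mention these strings, so a real loader would likely combine it with other signals such as the HTTP status code.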

Dependency extras

pip install "scrapegraphai[experimental-crawl4ai]"
pip install "scrapegraphai[experimental-obscura]"

Camoufox does not require a Python extra (it runs via npx).

Usage

graph_config = {
    "llm": {"model": "openai/gpt-4o-mini", "api_key": "..."},
    "experimental": {
        "backend": "crawl4ai",
        "crawl4ai": {"headless": True, "output_format": "markdown"}
    }
}
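
One way a fetch step might interpret this config is sketched below. It is an assumption-laden example: `resolve_backend` and `KNOWN_BACKENDS` are hypothetical names, and only the "crawl4ai" backend value appears in this description, so the "obscura" and "camoufox" keys are guesses at the naming scheme.

```python
from typing import Any, Dict, Optional, Tuple

# "crawl4ai" comes from the usage example; the other two names are assumed.
KNOWN_BACKENDS = ("crawl4ai", "obscura", "camoufox")


def resolve_backend(config: Dict[str, Any]) -> Tuple[Optional[str], Dict[str, Any]]:
    """Return (backend_name, backend_options) from an 'experimental' section.

    A result of (None, {}) means no experimental backend was requested and
    the default loader should be used.
    """
    experimental = config.get("experimental") or {}
    name = experimental.get("backend")
    if name is None:
        return None, {}
    if name not in KNOWN_BACKENDS:
        raise ValueError(
            f"Unknown experimental backend {name!r}; expected one of {KNOWN_BACKENDS}"
        )
    # Per-backend options live under a key named after the backend.
    return name, dict(experimental.get(name, {}))
```

Failing loudly on an unknown backend name, rather than silently using the default loader, keeps a typo in the config from going unnoticed.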

Notes

  • The Camoufox loader requires Node.js (npx) - see https://github.com/jo-inc/camofox-browser
  • The Obscura loader requires either Docker, the Obscura binary, or a Chrome instance with remote debugging
  • Existing tests pass; e2e tests require pytest -m e2e (network access)
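
For readers unfamiliar with custom pytest markers, the e2e marker could be registered roughly as follows. Where the PR actually registers it (a conftest.py hook or the pytest configuration file) is not stated here, so the conftest.py placement and the marker description text are assumptions.

```python
# conftest.py (assumed location for this sketch)


def pytest_configure(config):
    """Register the 'e2e' marker so that `pytest -m e2e` runs without warnings."""
    config.addinivalue_line(
        "markers",
        "e2e: end-to-end tests that require network access",
    )
```

Tests decorated with `@pytest.mark.e2e` are then selected by `pytest -m e2e` and can be excluded from a normal run with `pytest -m "not e2e"`.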

Add experimental backends module providing alternative document loaders
that can be selected via the node_config 'experimental' key.

Backends included:
- Crawl4aiLoader: async web crawler with markdown/HTML output
- ObscuraLoader: CDP-based stealth browser via Obscura or Chrome
- CamoufoxLoader: Firefox fork with C++-level fingerprint spoofing

Also:
- Add persistent Chrome profile and storage state caching to ChromiumLoader
- Add Cloudflare challenge detection with user guidance
- Add pytest e2e marker for network-dependent tests
- Add optional dependency groups: experimental-obscura, experimental-crawl4ai
- Support backend switching in FetchNode via node_config['experimental']

@dosubot (bot) added the size:XXL (This PR changes 1000+ lines, ignoring generated files) and enhancement (New feature or request) labels on Jun 23, 2026
@VinciGit00 closed this on Jun 24, 2026